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Abstract 

We give an algorithm, with R code, to minimize the multidimensional scaling stress 
loss function under the condition that some or all of the fitted distances are larger than 
given positive lower bounds. This paper is a companion to De Leeuw (2017). We also 
give some interesting majorization theory. 
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1 Introduction 

In this paper we study the minimization of stress, defined as 

inn 

stress(X) := - ” d iA X )f 

4 i= 1 3 = 1 

over n x p configuration matrices X. Here IT = {?%■} and A = {hp} are given symmetric, 
hollow, and non-negative matricesof, respectively, weights and dissimilarities. The difiX) are 
Euclidean distances between the points with coordinates in the rows of X. Thus 

d%j(X) := (xi - xfijfixi - xj) = (e; - efi'XXfiei - efi) = tr X’A l3 X. 

Here e % and e 3 are unit vectors, with all elements equal to zero except for the single element i 
or j, which is equal to one. Also A VJ := (a — efi){ei — e 3 )'. 

The problem of minimizing stress is well-studied (see, for example, Borg and Groenen (2005)). 
In this paper we analyze the problem of minimizing stress over all configurations X for which 
dij(X) > otij for some or all pairs i, j, where the a % j are positive bounds. As a special case 
we have MDS from above, where we require d l3 {X) > Si 3 for all i , j. 

These optimization problems are non-standard in various ways. The constraints difiX) > a, 3 
dehne a reverse convex set (Meyer (1970)) in configuration space, and in addition stress is 
not convex. We can, however, use basic smacof theory (De Leeuw (1977)) to majorize stress 
by a convex quadratic and to majorize the constraints by linear functions. Majorization 
can be used to dehne a simple iterative algorithm. In each iteration we majorize stress 
and the constraints, and then minimize the convex quadratic majorizer over configurations 
satisfying the majorized linear constraints using quadratic programming. Compute a new 
majorization at the minimizer of the majorized problem, and so on. These are the two steps 
in the majorization or MM approach to optimization (De Leeuw (1994), Lange (2016)). 

2 Some Majorization Theory 

Suppose the problem we are interested in is to minimize a real-valued /o over all x G M n that 
satisfy fj(x) < 0 for all j = 1, • • • , m. We call this problem V. 

Now suppose g 3 : M n ® —> M majorizes fj, i.e. 


fj(x ) < 9j{x,y ) for all x,y e M n , 
fj(y ) = 9j(y,y ) for all y G M n . 

To simplify matters we suppose all fj and g 3 are continuously differentiable, and all g 3 are 
convex in their first argument. Dehne problem Vk as the problem of minimizing gfix, ad fc_ L) 
over the x G M n that satisfy gj{x,x ( ' k ~ l fi < 0. Dehne a sequence x^ with feasible for V, 
and with x ^ for k > 1 a solution of Vk- 

Theorem 1: [Convergence] 
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1. x ^ is feasible for V for all k > l. 

2. f 0 (x ^) < for all k > 1. 

Proof: For the first part fj(x^) < gj{x^ k \x^ k ~ _1 )) by majorization, and g 3 {x^ k \x^ k _1) ) — o 
by feasibility for Vk- For the second part fo(x^) < go(x^ k \x^ k ^) by majorization. Because 
xW solves Vk we have g 0 (x^ k \ < g 0 (x ( ' k ~ 1 \x < ' k ~ 1 ' 1 ) if is feasible for Vk, he. if 

gj(x^ k ~ 1 \ x^- 1 )) < 0. But gj(x^ k ~ 1 \x^ k ~ 1 ^ > ) = fj(x^ k ~^) < 0 by the first part of the theorem. 

QED 


3 Fixed Points 

By theorem 1 we have a sequence of points with decreasing function values that are all 
feasible for V. We can say a bit more about accumulation points of this sequence. 

Define V(y) as the convex problem of minimizing go(x,y) over x G M n and gj(x,y ) < 0. 

Theorem 2: [Necessary] 

If x solves V(x) then x satisfies the first order necessary conditions for a local minimum of V. 

Proof: The necessary and sufficient conditions for a; to be a minimum of V(y) are that there 
exists A > 0 such that 

m 

vMx, y) + J2 V p i y ) = o, 

3=1 

with in addition g 3 (x,y) < 0 and A jg 3 (x,y) = 0. 

The first order necessary conditions for i to be a local minimum of V are that there exists 
A > 0 such that 

m 

'Dfo(x) + ^2 XjVfj(x) = 0, 
i=i 

with in addition fj(x) < 0 and A jfj(x) = 0. 

But if the fj and g 3 are differentiable, we know from majorization theory (De Leeuw (2016), 
Lange (2016)) that Vfj(x) = g 3 {x,x) for all x. 

QED 

It follows that if we define V(y) as the solution of V(y) then hxed points of T are local 
minimizers of V. Thus if x^ converges to a fixed point of V, it converges to a local minimizer 
of V, and fo(x ^) converges to a local minimum. 


4 MDS with Lower Bounds 


We have see in the companion paper (De Leeuw (2017)) that we can write 

stress(a:) = 1 — p(x) + -x'x, 
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where 


p{x) '■= J2 w ^kd k (x), 

k= 1 

with distances dk(x) := \Jx' A k x , and the H*, positive semi-dehnite matrices that add up to 
the identity. Define 

B ( x ) := 

fc=i a k\x) 

and define the Guttman transform, if r as r(r) := B(x)x, then for all x 

stress(x) = 1 — x'r(x) + ^ x'x = (1 — ^r(x)T(a;)) + — r(x)) / (a; — T(x)), 

and for all x, y we have the majorization 

stress(a;) < 1 - x'T(y) + * x'x = (1 - ^T(y)T(y)) + * (z - Y{y)\x - T(y)), 

The constraints are y/x'Aj-x > a k or a k — y/x' A k x < 0. The linearized constraints are 


ot " vm> x ' Aky - 0 

Thus the quadratic programming problem we have to solve in each smacof iteration is 
minimization of 1 — x T(aj( fc - 1 )) + \x'x over x satisfying 

x , A k x {k ~ 1) > a k d k (x {k ~ 1} ). 

We use the qp function from the lsei package (Wang, Lawson, and Hanson (2015)). 


5 Examples 

5.1 Equidistances 

Our first example has n = 10 with all S t] equal. The optimal smacof solution in two 
dimensions needs 131 iterations to arrive at stress 0.1098799783. We reanalyze the data with 
the constraint that the distance between the first point and all nine others is at least one. 
The algorithm incorporating these bounds, implemented in the function smacof Above, uses 
157 iterations and finds stress 0.1340105192. The two configurations are in figure 1. 
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Figure 1: Equidistance Data, Regular Left, Lower Bounds Right 
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Figure 2: Equidistance Data, Regular Left, Lower Bounds Right 


5.2 Dutch Political Parties 1967 

As the next illustration we use data from De Gruijter (1967), with average dissimilarity 
judgments between Dutch political parties in 1967. The data are 

## KVP PvdA VVD ARP CHU CPN PSP BP 

## PvdA 5.63 

## VVD 5.27 6.72 

## ARP 4.60 5.64 5.46 

## CHU 4.80 6.22 4.97 3.20 

## CPN 7.54 5.12 8.13 7.84 7.80 

## PSP 6.73 4.59 7.55 6.73 7.08 4.08 
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## BP 7.18 7.22 6.90 7.28 6.96 6.34 6.88 

## D66 6.17 5.47 4.67 6.13 6.04 7.42 6.36 7.36 


The optimal smacof solution in two dimensions needs 319 iterations to arrive at stress 
0.044603386. Approximation from above, where we require that all distances are at least as 
large as the corresponding dissimilarties, uses 30 iterations and finds stress 0.2801306914. At 
the solution there are 15 active constraints, but as the Shepard plots in figure 4 show, the 
active constraints now do not correspond with the smallest dissimilarities, although they do 
correspond to the smallest distances. 
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Figure 3: De Gruijter Data, Regular Left, From Above Right 




Figure 4: De Gruijter Data, Regular Left, From Above Right 


In the next analysis we require that all distances are larger than or equal to 3.2, the minimum 
dissimilarity. In order words, the smallest distances must be larger than or equal to the 
minimum dissimilarity. 
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The optimal solution under these constraints in two dimensions needs 128 iterations to arrive 
at stress 0.0509159458. 



Figure 5: De Gruijter Data, Distances Bounded by Minimum Dissimilarity 

And finally an analysis in which we require that distances between ARP, CHU, KVP (the 
Christian Democrat parties) are larger than 5, and the distances between PvdA, PSP, CPN 
(the leftist parties) are also larger than 5. This forces them apart. 

The optimal solution under these constraints in two dimensions needs 167 iterations to arrive 
at stress 0.0807378807. The six constrained distances are 5, 5, 5, 7.8645711944, 5.0000000003, 
5.0000000001, so the Christian Democrats, for example, are on an equilateral triangle with 
side 5. 
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Figure 6: De Gruijter Data, Imposing Large Distances 
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6 Discussion 


The obvious next step is to combine the algorithms of De Leeuw (2017) and this paper to 
implement both upper and lower bounds on the distances. Of course for that case we have to 
be careful that a solution actually exists. 

Another next step is to see in how far these bounding algorithms are useful for both metric 
and non-metric unfolding. 

7 Appendix: Code 

7.1 above.R 

suppressPackageStartupMessages (library (mgev, quietly = TRUE)) 

suppressPackageStartupMessages (library (lsei, quietly = TRUE)) 

source ( "auxilary .R") 

source ( "mdsUtils .R") 

source ("smacof .R") 

source ( " smacofAbove. R" ) 

7.2 auxilary. R 

mprint <- function (x, 

d = 6, 
w = 8, 
f = "") { 

print (noquote (formate ( 
x, 

di = d, 
wi = w, 
fo = "f", 
flag = f 
))) 

> 

directSum <- function (x) { 
m <- length (x) 
nr <- sum (sapply (x, nrow)) 
nc <- sum (sapply (x, ncol)) 
z <- matrix (0, nr, nc) 
kr <- 0 
kc <- 0 

for (i in l:m) { 
ir <- nrow (x [ [i] ]) 


ic <- ncol (x[ [i]]) 

z [kr + (l:ir), kc + ( 1: ic)] <- x [ [i]] 
kr <- kr + ir 
kc <- kc + ic 

} 

return (z) 

> 

repList <- function(x, n) { 
z <- list() 
for (i in l:n) 

z <- c(z, list(x)) 
return(z) 


shapeMe <- function (x) { 
m <- length (x) 

n <- (1 + sqrt (1 + 8 * m)) / 2 
d <- matrix (0, n, n) 
k <- 1 

for (i in 2:n) { 

for (j in 1: (i - 1)) { 

d[i, j] <- d[j, i] <- x [k] 
k <- k + 1 

> 

} 

return (d) 


symmetricFromTriangle <- function (x, lower = TRUE, diagonal = TRUE) { 
k <- length (x) 
if (diagonal) 

n <- (sqrt (1+8* k) -1) / 2 
else 

n <- (sqrt (1+8* k) +1) / 2 
if (n != as.integer (n)) 
stop ("input error") 
nn <- 1: n 

if (diagonal && lower) 
m <- outer (nn, nn, ">=") 
if (diagonal && (! lower)) 
m <- outer (nn, nn, "<=") 
if (([diagonal) && lower) 
m <- outer (nn, nn, ">") 
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if (((diagonal) && (! lower)) 
m <- outer (nn, nn, "<") 
b <- matrix (0, n, n) 
b[m] <- x 
b <- b + t(b) 
if (diagonal) 

diag (b) <- diag(b) / 2 
return (b) 


triangleFromSymmetric <- function (x, lower = TRUE, diagonal = TRUE) { 
n <- ncol (x) 
nn <- 1: n 

if (diagonal && lower) 
m <- outer (nn, nn, ">=") 
if (diagonal && (! lower)) 
m <- outer (nn, nn, "<=") 
if ((Idiagonal) && lower) 
m <- outer (nn, nn, ">") 
if ((Idiagonal) && (I lower)) 
m <- outer (nn, nn, "<") 
return (x [m]) 


7.3 mdsUtils.R 

library (mgcv) 

torgerson <- function (delta, p = 2) { 

z <- slanczos(-doubleCenter((delta 2) / 2) , p) 
w <- matrix (0, p, p) 
v <- pmax (z$values, 0) 
diag (w) <- sqrt (v) 
return (z$vectors %*% w) 


basisPrep <- function (n, p, w) { 
m <- n * (n - 1) / 2 

v <- -symmetricFromTriangle (w, diagonal = FALSE) 

diag (v) <- -rowSums(v) 

ev <- eigen (v) 

eval <- ev$values[1:(n - 1)] 

evec <- ev$vectors[, 1: (n - 1)] 
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z <- evec y,*y, diag (1 / sqrt (eval)) 
a <- array (0, c(n - 1, n - 1, m)) 
k <- 1 

for (j in 1: (n-1) ) { 
for (i in (j+1):n) { 
dif <- z [i,] - z[j ,] 
a [, , k] <- outer (dif, dif) 
k <- k + 1 

> 

} 

return (list (z = z, a = a)) 

} 

center <- function (x) { 

return (apply (x, 2, function (z) z - mean (z))) 

} 

doubleCenter <- function (x) { 
n <- nrow (x) 
j <- diag(n) - (1 / n) 
return (j 1*1 x 1*1 j) 

} 

squareDist <- function (x) { 
d <- diag (x) 

return (outer (d, d, "+") - 2 * x) 


7.4 smacof.R 

smacof <- 

function (delta, 

p = 2, 

xini = torgerson (symmetricFromTriangle (delta, diagonal = FALSE), p), 
w = rep (1, length (delta)), 
itmax = 1000, 
eps = le-10, 
verbose = FALSE) { 
m <- length (delta) 
n <- (1 + sqrt (1 + 8 * m)) / 2 
r <- p * (n - 1) 
h <- basisPrep (n, p, w) 
xold <- rep (0, r) 
for (s in l:p) { 


11 


k <- (s - 1) * (n - 1) + 1: (n - 1) 
xold[k] <- 

crossprod (h$z, xini[, s]) / diag (crossprod (h$z)) 

> 

d <- rep (0, m) 
itel <- 1 
sold <- Inf 

ssq <- sum (w * delta ~ 2) 
repeat { 

bmat <- matrix (0, n-1, n-1) 
xx <- matrix (xold, n - 1, p) 
for (k in l:m) { 
ak <- h$a[, , k] 

d[k] <- sqrt (sum (xx * (ak xx))) 

bmat <- bmat + w[k] * (delta[k] / d[k]) * ak 

> 

xnew <- as.vector (bmat xx) 
snew <- sum (w * (delta - d) ~ 2) / ssq 
if (verbose) 
cat ( 

"Iteration: ", 

formate (itel, width = 3, format = "d"), 

"sold: ", 
formate ( 
sold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"snew: ", 
formate ( 
snew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if ((itel == itmax) II ((sold - snew) < eps)) 
break 

xold <- drop (xnew) 
sold <- snew 
itel <- itel + 1 
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xconf <- matrix (0, n, p) 
for (s in l:p) { 

k <- (s - 1) * (n - 1) + 1: (n - 1) 
xconf [, s] <- h$z y.*y. xnew[k] 

> 

return (list ( 
delta = delta, 
dist = d, 
x = xconf, 
itel = itel, 
stress = snew 

)) 

} 

7.5 smacofAbove.R 

smacofAbove <- 
function (delta, 

p = 2, 

xini = torgerson (symmetricFromTriangle (delta, diagonal = FALSE), p), 
w = rep (1, length (delta)), 
bnd = rep (0, length (delta)), 
itmax = 1000, 
eps = le-10, 
verbose = FALSE) { 
m <- length (delta) 
n <- (1 + sqrt (1 + 8 * m)) / 2 
wnd <- which (bnd > 0) 

1 <- length (wnd) 

xd <- as.vector (dist (xini)) 

lbd <- max (bnd[wnd] / xd[bnd]) 

xini <- lbd * xini 

r <- p * (n - 1) 

h <- basisPrep (n, p, w) 

xold <- rep (0, r) 

for (s in 1: p) { 

k <- (s - 1) * (n - 1) + 1: (n - 1) 
xold[k] <- 

crossprod (h$z, xini[, s]) / diag (crossprod (h$z)) 

> 

d <- rep (0, m) 
e <- matrix (0, 1, r) 
itel <- 1 
sold <- Inf 
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ssq <- sum (w * delta ~ 2) 
repeat { 

xmid <- matrix (0, n - 1, p) 
xx <- matrix (xold, n - 1, p) 
t <- 1 

for (k in l:m) { 
ak <- h$a[, , k] 
ax <- ak 1*1 xx 
d[k] <- sqrt (sum (xx * ax)) 
if (lis.na (match (k, wnd))) { 
e[t, ] <- as.vector (ax / d[k]) 
t <- t + 1 

} 

xmid <- xmid + w[k] * (delta[k] / d[k]) * ax 

> 

xnew <- qp (q = diag (r), p = -as.vector (xmid), e = e, f = bnd[wnd]) 
snew <- sum (w * (delta - d) ~ 2) / ssq 
if (verbose) 
cat ( 

"Iteration: ", 

formate (itel, width = 3, format = "d"), 

"sold: ", 
formate ( 
sold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"snew: ", 
formate ( 
snew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if ((itel == itmax) II ((sold - snew) < eps)) 
break 

xold <- xnew 
sold <- snew 
itel <- itel + 1 

> 

xconf <- matrix (0, n, p) 


14 


for (s in l:p) { 

k <- (s - 1) * (n - 1) + 1: (n - 1) 
xconf[, s] <- h$z 1*1 xnew[k] 

> 

return ( 
list ( 

delta = delta, 
dist = d, 
x = xconf, 
itel = itel, 
stress = snew, 

constraints = d[wnd] - bnd[wnd] 

) 

) 

} 
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